1. Elasticsearch Query Basics
Elasticsearch provides a powerful query DSL (Domain Specific Language) that supports many query types and complex search requirements.
1.1 Query DSL Overview
1.1.1 Query Structure
json
{
"query": {
"<query_type>": {
"<parameter>": "<value>"
}
},
"from": 0,
"size": 10,
"sort": [{ "<field_name>": { "order": "desc" } }],
"_source": ["<field1>", "<field2>"]
}
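1.1.2 Query Context
Elasticsearch evaluates a clause in one of two contexts:
- Query context: determines whether a document matches and computes a relevance score (_score)
- Filter context: determines only whether a document matches; no score is computed, so it is usually faster
Choosing a context
- Use query context when results need to be ranked by relevance
- Use filter context when you only need to include or exclude documents
- Frequently used filters can be cached by Elasticsearch, which improves performance further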
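The choice between the two contexts can be sketched as a small request builder. This is only an illustration in Python: the helper name build_bool_query is hypothetical and not part of any Elasticsearch client, and the field names are borrowed from the examples in this guide. It places clauses that should influence ranking under must and pure yes/no conditions under filter.

```python
def build_bool_query(scoring_clauses, filter_clauses):
    """Build a search body (hypothetical helper, not a client API).

    scoring_clauses go under `must` (query context, scored);
    filter_clauses go under `filter` (filter context, not scored).
    """
    bool_body = {}
    if scoring_clauses:
        bool_body["must"] = list(scoring_clauses)
    if filter_clauses:
        bool_body["filter"] = list(filter_clauses)
    return {"query": {"bool": bool_body}}


# Rank by how well the message matches, but only among recent ERROR logs.
body = build_bool_query(
    scoring_clauses=[{"match": {"message": "connection timeout"}}],
    filter_clauses=[
        {"term": {"level": "ERROR"}},
        {"range": {"@timestamp": {"gte": "now-1h"}}},
    ],
)
```

The resulting dictionary can be serialized to JSON and sent as the body of a _search request.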
1.2 Basic Query Syntax
1.2.1 Match All Query
json
GET /_search
{
"query": {
"match_all": {}
}
}
Returns all documents; commonly used for testing or basic statistics.
1.2.2 Match Query
json
GET /logs-2024-01-01/_search
{
"query": {
"match": {
"message": "error connection timeout"
}
}
}
- Analyzes the query string and matches on the resulting tokens
- Supports fuzzy matching (through the fuzziness parameter) and relevance scoring
1.2.3 Term Query
json
GET /logs/_search
{
"query": {
"term": {
"status": "500"
}
}
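}
- Matches the exact term; the query value is not analyzed
- Suited to keyword, numeric, and boolean fields; avoid it on text fields, whose indexed values are analyzed
1.2.4 Terms Query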
json
GET /logs/_search
{
"query": {
"terms": {
"status": ["404", "500", "502"]
}
}
}
Matches documents that contain any one of the listed exact values.
2. Full-Text Search Queries
2.1 Advanced Match Queries
2.1.1 Phrase Matching
json
GET /logs/_search
{
"query": {
"match_phrase": {
"message": "connection timeout"
}
}
}
Requires the query terms to appear in the same order and adjacent to each other.
2.1.2 Fuzzy Matching
json
GET /logs/_search
{
"query": {
"fuzzy": {
"hostname": {
"value": "webserver",
"fuzziness": "AUTO"
}
}
}
}
Tolerates spelling mistakes by matching terms within a small edit distance of the query value.
2.1.3 Wildcard Query
json
GET /logs/_search
{
"query": {
"wildcard": {
"filename": "access.log*"
}
}
}
Supports the * wildcard (any sequence of characters, including none) and the ? wildcard (any single character).
2.2 Multi-Field Search
2.2.1 Multi Match Query
json
GET /logs/_search
{
"query": {
"multi_match": {
"query": "database error",
"fields": ["message", "error_details", "stack_trace"]
}
}
}
Searches for the same text across several fields.
2.2.2 Cross-Field Search
json
GET /logs/_search
{
"query": {
"multi_match": {
"query": "nginx 404",
"fields": ["message^2", "hostname"],
"type": "cross_fields"
}
}
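}
The cross_fields type treats the listed fields as one combined field, so the query terms can match in different fields. ^2 doubles the weight of the message field.
3. Range and Geo Queries
3.1 Range Query
3.1.1 Numeric Range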
json
GET /logs/_search
{
"query": {
"range": {
"response_time": {
"gte": 1000,
"lte": 5000
}
}
}
}
3.1.2 Date Range
json
GET /logs/_search
{
"query": {
"range": {
"@timestamp": {
"gte": "2024-01-01",
"lte": "2024-01-31",
"format": "yyyy-MM-dd"
}
}
}
}
3.1.3 Range Operators
| Operator | Meaning | Example |
|---|---|---|
| gt | Greater than | {"gt": 100} |
| gte | Greater than or equal to | {"gte": 100} |
| lt | Less than | {"lt": 1000} |
| lte | Less than or equal to | {"lte": 1000} |
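The operators in the table can be combined into a single range clause. The Python sketch below shows one way to assemble such a clause; the function name range_clause is hypothetical, and the check against contradictory bounds is a convenience of this sketch rather than something the source describes.

```python
def range_clause(field, *, gt=None, gte=None, lt=None, lte=None):
    """Build a range clause from the four operators (hypothetical helper)."""
    bounds = {"gt": gt, "gte": gte, "lt": lt, "lte": lte}
    # Keep only the operators that were supplied; 0 is a valid bound.
    given = {op: value for op, value in bounds.items() if value is not None}
    if not given:
        raise ValueError("at least one bound is required")
    if gt is not None and gte is not None:
        raise ValueError("use either gt or gte, not both")
    if lt is not None and lte is not None:
        raise ValueError("use either lt or lte, not both")
    return {"range": {field: given}}


# Same clause as the numeric range example: 1000 <= response_time <= 5000.
slow_requests = range_clause("response_time", gte=1000, lte=5000)
```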
3.2 Geo Queries
3.2.1 Geo Location Filter
json
GET /logs/_search
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 40.73,
"lon": -74.1
},
"bottom_right": {
"lat": 40.61,
"lon": -73.77
}
}
}
}
}
3.2.2 Geo Distance Query
json
GET /logs/_search
{
"query": {
"geo_distance": {
"distance": "100km",
"location": {
"lat": 40.715,
"lon": -73.985
}
}
}
}
4. Bool Query
The bool query is one of the most commonly used compound query types in Elasticsearch.
4.1 Boolean Clauses
json
GET /logs/_search
{
"query": {
"bool": {
"must": [
{"match": {"level": "ERROR"}},
{"range": {"@timestamp": {"gte": "now-1h"}}}
],
"must_not": [
{"match": {"message": "test"}}
],
"should": [
{"term": {"hostname": "web01"}},
{"term": {"hostname": "web02"}}
],
"filter": [
{"term": {"status": "active"}}
],
"minimum_should_match": 1
}
}
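}
4.2 Clause Details
4.2.1 Must (must match)
- Every must clause has to match
- Contributes to the relevance score
- Equivalent to a logical AND
4.2.2 Must Not (must not match)
- No must_not clause may match
- Does not affect the relevance score
- Equivalent to a logical NOT
4.2.3 Should (should match)
- At least minimum_should_match should clauses have to match; when must or filter clauses are present the default is 0, so should clauses then only raise the score
- Documents that match more should clauses score higher
- Equivalent to a logical OR
4.2.4 Filter
- Has to match, but does not affect the score
- Results can be served from the query cache
- Usually faster than must, because no score is computed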
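The matching rules of the four clause types can be modeled in a few lines of Python. This is a sketch of the matching logic only: it does not model scoring, each clause is represented by a plain Python predicate rather than a real query, and the name bool_matches is hypothetical.

```python
def bool_matches(doc, must=(), must_not=(), should=(), filter_=(),
                 minimum_should_match=None):
    """Return True if doc satisfies the bool clauses (matching only, no scoring)."""
    if minimum_should_match is None:
        # Default: 1 when there are should clauses but no must/filter, else 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


def is_error(doc): return doc["level"] == "ERROR"
def mentions_test(doc): return "test" in doc["message"]
def is_web01(doc): return doc["hostname"] == "web01"
def is_web02(doc): return doc["hostname"] == "web02"
def is_active(doc): return doc["status"] == "active"


doc = {"level": "ERROR", "message": "disk failure",
       "hostname": "web01", "status": "active"}
other = dict(doc, hostname="db01")

same_as_4_1 = bool_matches(doc, must=[is_error], must_not=[mentions_test],
                           should=[is_web01, is_web02], filter_=[is_active],
                           minimum_should_match=1)
```

With minimum_should_match left unset, the document from host db01 would still match, because the must clause is present and the should clauses become optional.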
4.3 Nested Bool Queries
json
GET /logs/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{"term": {"level": "ERROR"}},
{"range": {"response_time": {"gte": 5000}}}
]
}
},
{
"bool": {
"must": [
{"match": {"message": "timeout"}},
{"term": {"service": "api"}}
]
}
}
],
"minimum_should_match": 1
}
}
}
5. Aggregations
Aggregations are Elasticsearch's analytics feature; they compute statistics and summaries over the documents that match a query.
5.1 Basic Aggregations
5.1.1 Metric Aggregations
json
GET /logs/_search
{
"size": 0,
"aggs": {
"avg_response_time": {
"avg": {
"field": "response_time"
}
},
"total_requests": {
"sum": {
"field": "request_count"
}
}
}
}
| Metric aggregation | Description | Example |
|---|---|---|
| avg | Average | Average response time |
| sum | Sum | Total number of requests |
| min/max | Minimum/maximum | Fastest/slowest response |
| cardinality | Distinct count (approximate) | Number of unique users |
5.1.2 Bucket Aggregations
json
GET /logs/_search
{
"size": 0,
"aggs": {
"status_codes": {
"terms": {
"field": "status",
"size": 10
}
}
}
}
Groups documents by field value and counts the documents in each group.
5.2 Advanced Aggregations
5.2.1 Date Histogram Aggregation
json
GET /logs/_search
{
"size": 0,
"aggs": {
"requests_over_time": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "1h",
"format": "yyyy-MM-dd HH:mm"
}
}
}
}
Groups documents into buckets of a fixed time interval.
5.2.2 Range Aggregation
json
GET /logs/_search
{
"size": 0,
"aggs": {
"response_time_ranges": {
"range": {
"field": "response_time",
"ranges": [
{"to": 100},
{"from": 100, "to": 1000},
{"from": 1000}
]
}
}
}
}
Groups documents into numeric ranges. Each range includes its from value and excludes its to value.
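To make the bucket boundaries concrete, the following Python sketch counts values per range in the same way, with from inclusive and to exclusive. It runs on an in-memory list rather than an index, and the function name range_buckets and the sample values are made up for illustration.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    buckets = []
    for spec in ranges:
        low = spec.get("from")   # None means no lower bound
        high = spec.get("to")    # None means no upper bound
        count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


# Hypothetical response times in milliseconds.
response_times = [50, 100, 999, 1000, 2500]
buckets = range_buckets(
    response_times,
    [{"to": 100}, {"from": 100, "to": 1000}, {"from": 1000}],
)
```

A value of exactly 100 lands in the second bucket and a value of exactly 1000 in the third, so adjacent ranges never count the same document twice.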
5.2.3 Nested Aggregations
json
GET /logs/_search
{
"size": 0,
"aggs": {
"by_status": {
"terms": {
"field": "status"
},
"aggs": {
"avg_response_time": {
"avg": {
"field": "response_time"
}
},
"error_messages": {
"top_hits": {
"size": 5,
"_source": ["message", "@timestamp"]
}
}
}
}
}
}
Sub-aggregations run inside each bucket of the parent bucket aggregation.
5.3 Pipeline Aggregations
5.3.1 Derivative Aggregation
json
GET /metrics/_search
{
"size": 0,
"aggs": {
"requests_per_hour": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "1h"
},
"aggs": {
"requests": {
"sum": {
"field": "count"
}
},
"requests_derivative": {
"derivative": {
"buckets_path": "requests"
}
}
}
}
}
}
Computes the difference between the values of adjacent buckets.
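The arithmetic is simple enough to show directly. The Python sketch below takes the per-bucket sums in order and returns the change from each bucket to the next; the first bucket has no predecessor, so it has no derivative. The sample numbers are invented, and the sketch ignores empty buckets and gap handling.

```python
def derivative(bucket_values):
    """Difference between each bucket and the one before it.

    The first entry is None because the first bucket has no predecessor.
    """
    if not bucket_values:
        return []
    result = [None]
    for previous, current in zip(bucket_values, bucket_values[1:]):
        result.append(current - previous)
    return result


# Hypothetical hourly request totals.
hourly_requests = [120, 150, 90, 90]
changes = derivative(hourly_requests)
```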
5.3.2 Moving Average Aggregation
json
GET /metrics/_search
{
"size": 0,
"aggs": {
"requests_per_minute": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1m"
},
"aggs": {
"requests": {
"sum": {
"field": "count"
}
},
"requests_moving_avg": {
"moving_fn": {
"buckets_path": "requests",
"window": 5,
"script": "MovingFunctions.unweightedAvg(values)"
}
}
}
}
}
}
Computes the average over a sliding window of buckets (here, 5 buckets).
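The idea behind a moving average can be sketched in Python as a plain trailing average. This version averages the current value together with the values before it, up to the window size; that choice is an assumption of this sketch, and the exact window that Elasticsearch uses for each bucket may differ. The function name and sample data are hypothetical.

```python
def trailing_average(values, window):
    """Unweighted average over the last `window` values, current one included."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)   # shorter window at the start of the series
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical per-minute request totals, smoothed over 2 buckets.
smoothed = trailing_average([10, 20, 30, 40], window=2)
```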
6. Query Optimization Tips
6.1 Performance Optimization Strategies
6.1.1 Filter Context vs. Query Context
json
# Recommended: use filter, which skips scoring and can be cached
GET /logs/_search
{
"query": {
"bool": {
"filter": [
{"term": {"status": "200"}},
{"range": {"@timestamp": {"gte": "now-1h"}}}
]
}
}
}
# Avoid: must computes a score even though these clauses only need to filter
GET /logs/_search
{
"query": {
"bool": {
"must": [
{"term": {"status": "200"}},
{"range": {"@timestamp": {"gte": "now-1h"}}}
]
}
}
}
6.1.2 Pagination Optimization
json
# Small result sets: use from/size
GET /logs/_search
{
"from": 0,
"size": 100
}
# Large result sets: use search_after with the sort values of the last hit on the previous page
GET /logs/_search
{
"size": 100,
"search_after": [1609459200000, "doc_id_100"],
"sort": [
{"@timestamp": "desc"},
{"_id": "asc"}
]
}
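The search_after values in the request above come from the previous page: each hit returned by a sorted search carries a sort array, and the last hit's array becomes the cursor for the next request. A minimal Python sketch of that step follows; next_page_body is a hypothetical helper, and the hit shown is a hand-written stand-in for a real response.

```python
import copy


def next_page_body(body, last_hit):
    """Derive the next-page request from the previous request and its last hit."""
    if "sort" not in body:
        raise ValueError("search_after requires an explicit sort")
    following = copy.deepcopy(body)      # leave the original request untouched
    following.pop("from", None)          # from is not combined with search_after
    following["search_after"] = list(last_hit["sort"])
    return following


first_page = {
    "size": 100,
    "sort": [{"@timestamp": "desc"}, {"_id": "asc"}],
}
# Stand-in for the last hit of the first page's response.
last_hit = {"_id": "doc_id_100", "sort": [1609459200000, "doc_id_100"]}
second_page = next_page_body(first_page, last_hit)
```

The sort must stay identical between pages and should end with a unique tiebreaker field, otherwise documents with equal sort values can be skipped or repeated.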
6.1.3 Field Selection Optimization
json
GET /logs/_search
{
"_source": ["@timestamp", "level", "message"],
"query": {
"match": {
"message": "error"
}
}
}
Returns only the fields you need, which reduces network transfer.
6.2 Caching
6.2.1 Query Cache
json
# Clauses in filter context are eligible for the node query cache automatically;
# Elasticsearch decides what to cache, and there is no per-query flag to force it
GET /logs/_search
{
"query": {
"bool": {
"filter": {
"term": {
"status": "200"
}
}
}
}
}
6.2.2 Shard Request Cache
json
PUT /logs/_settings
{
"index.requests.cache.enable": true
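}
Enables the shard-level request cache for the index. By default it caches only the results of requests with size set to 0, such as aggregation-only searches.
6.3 Index Optimization
6.3.1 Pre-Indexing Data for Common Queries
For complex queries that run frequently, compute the values they need at index time and store them in a dedicated field: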
json
PUT /logs/_mapping
{
"properties": {
"search_terms": {
"type": "keyword",
"index": true
}
}
}
6.3.2 Field Data Type Optimization
json
PUT /logs/_mapping
{
"properties": {
"status": {
"type": "keyword",
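"doc_values": false // disable only if the field is never sorted on, aggregated on, or read by scripts
},
"response_time": {
"type": "scaled_float",
"scaling_factor": 100 // values are multiplied by 100 and stored as a rounded long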
}
}
}
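The effect of scaling_factor in the mapping above can be illustrated with a little arithmetic: a scaled_float value is multiplied by the factor and stored as a rounded integer, so a factor of 100 keeps two decimal places. The Python sketch below mimics that round trip; the function names are hypothetical, and Python's round may break exact .5 ties differently from Elasticsearch.

```python
def scaled_float_encode(value, scaling_factor):
    """Integer that would be stored: value * factor, rounded to the nearest integer."""
    return round(value * scaling_factor)


def scaled_float_decode(stored, scaling_factor):
    """Value recovered from the stored integer."""
    return stored / scaling_factor


# Hypothetical response time of 12.3456 ms with scaling_factor 100.
stored = scaled_float_encode(12.3456, 100)
recovered = scaled_float_decode(stored, 100)
```

Precision beyond the factor is lost (12.3456 comes back as 12.35), which is the trade-off for the more compact integer storage.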
7. Practical Query Examples
7.1 Log Analysis Queries
7.1.1 Error Log Statistics
json
GET /logs/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{"term": {"level": "ERROR"}},
{"range": {"@timestamp": {"gte": "now-24h"}}}
]
}
},
"aggs": {
"errors_by_hour": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "1h"
}
},
"errors_by_type": {
"terms": {
"field": "error_type",
"size": 10
}
},
"top_error_messages": {
"terms": {
"field": "message.keyword",
"size": 5
}
}
}
}
7.1.2 Performance Monitoring Query
json
GET /metrics/_search
{
"size": 0,
"query": {
"range": {
"@timestamp": {
"gte": "now-1h"
}
}
},
"aggs": {
"avg_response_time": {
"avg": {
"field": "response_time"
}
},
"95th_percentile": {
"percentiles": {
"field": "response_time",
"percents": [95]
}
},
"response_time_histogram": {
"histogram": {
"field": "response_time",
"interval": 100
}
}
}
}
7.2 Security Analysis Queries
7.2.1 Anomalous Access Detection
json
GET /access_logs/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{"range": {"@timestamp": {"gte": "now-1h"}}},
{"terms": {"status": ["403", "404", "500"]}}
]
}
},
"aggs": {
"suspicious_ips": {
"terms": {
"field": "client_ip",
"size": 20,
"min_doc_count": 10
},
"aggs": {
"status_distribution": {
"terms": {
"field": "status"
}
},
"last_request": {
"top_hits": {
"size": 1,
"sort": [{"@timestamp": {"order": "desc"}}],
"_source": ["@timestamp", "request", "user_agent"]
}
}
}
}
}
}
7.2.2 Threat Intelligence Query
json
GET /logs/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"message": "sql injection"
}
},
{
"terms": {
"client_ip": ["10.0.0.1", "192.168.1.100"]
}
},
{
"fuzzy": {
"user_agent": {
"value": "malicious scanner",
"fuzziness": 2
}
}
}
],
"minimum_should_match": 1,
"filter": [
{"range": {"@timestamp": {"gte": "now-24h"}}}
]
}
},
"sort": [
{"@timestamp": {"order": "desc"}}
]
}
7.3 Business Analysis Queries
7.3.1 User Behavior Analysis
json
GET /user_logs/_search
{
"size": 0,
"query": {
"range": {
"@timestamp": {
"gte": "now-30d"
}
}
},
"aggs": {
"daily_active_users": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "1d"
},
"aggs": {
"unique_users": {
"cardinality": {
"field": "user_id"
}
}
}
},
"user_segments": {
"terms": {
"field": "user_type"
},
"aggs": {
"avg_session_time": {
"avg": {
"field": "session_duration"
}
},
"page_views": {
"sum": {
"field": "page_views"
}
}
}
}
}
}
Query Design Best Practices
- [ ] Define the purpose of the query and the data range it covers
- [ ] Choose a suitable query type and context
- [ ] Use filters to improve performance
- [ ] Keep aggregation nesting to a reasonable depth
- [ ] Limit the returned fields and the page size
- [ ] Take advantage of caching
7.4 Real-Time Monitoring Dashboard Queries
7.4.1 System Status Overview
json
GET /system_metrics/_search
{
"size": 0,
"query": {
"range": {
"@timestamp": {
"gte": "now-5m"
}
}
},
"aggs": {
"current_cpu_usage": {
"avg": {
"field": "cpu_percent"
}
},
"current_memory_usage": {
"avg": {
"field": "memory_percent"
}
},
"active_connections": {
"max": {
"field": "connections"
}
},
"error_rate": {
"avg": {
"field": "error_rate"
}
}
}
}
7.4.2 Trend Analysis
json
GET /performance_metrics/_search
{
"size": 0,
"aggs": {
"response_time_trend": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1m"
},
"aggs": {
"avg_response_time": {
"avg": {
"field": "response_time"
}
},
"p95_response_time": {
"percentiles": {
"field": "response_time",
"percents": [95]
}
}
}
}
}
}
Query Performance Notes
- Avoid deep pagination: by default, requests with from + size > 10000 are rejected (index.max_result_window)
- Queries over large time ranges can cause performance problems
- Deeply nested aggregations increase the computation cost
- Regular expression queries perform poorly and should be used with care
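The deep-pagination limit in the first note can be checked on the client before a request is sent. The Python sketch below assumes the default limit of 10000; the function name fits_result_window is hypothetical, and an index with a different index.max_result_window setting would need that value passed in.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # assumed default of index.max_result_window


def fits_result_window(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return True if from + size stays within the result window."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    return from_ + size <= max_result_window


# Page 100 of 100 hits per page is the last one that still fits.
last_allowed = fits_result_window(9900, 100)
too_deep = fits_result_window(9901, 100)
```

When the check fails, switching to search_after is the usual alternative to raising the limit.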